Analysis of Temporal-Difference Learning
Author
Abstract
We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite-state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counter-examples with regard to the significance of on-line updating and linearly parameterized function approximators.
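As an illustration of the setting the abstract describes, the following is a minimal Python sketch of TD(λ) with a linear approximator, updated on-line along a single simulated trajectory. It is not taken from the paper: the three-state chain `P`, the per-state costs `g`, the features `PHI`, the discount factor, and the diminishing step-size schedule are all assumptions chosen for illustration. In this particular example the exact cost-to-go is linear in the state index, J(x) = a + b x with b = 1/0.55 and a = 0.45 b / 0.1, so the learned parameters should drift toward (a, b).

```python
import random

# Hypothetical example chain (aperiodic, irreducible, 3 states).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
g = [0.0, 1.0, 2.0]                          # assumed per-state costs
PHI = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # assumed features: (1, state index)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda(P, g, PHI, gamma=0.9, lam=0.0, steps=200_000, seed=0):
    """On-line TD(lambda) with linear features along one trajectory.

    Returns the parameter vector r such that dot(PHI[x], r)
    approximates the discounted cost-to-go of state x.
    """
    rng = random.Random(seed)
    n, k = len(PHI), len(PHI[0])
    r = [0.0] * k          # parameter vector
    z = [0.0] * k          # eligibility trace
    x = 0                  # arbitrary start state
    for t in range(steps):
        # Sample the next state from row x of the transition matrix.
        x_next = rng.choices(range(n), weights=P[x])[0]
        # Temporal difference for the observed transition.
        d = g[x] + gamma * dot(PHI[x_next], r) - dot(PHI[x], r)
        # Decay the trace and add the current feature vector.
        z = [gamma * lam * zi + fi for zi, fi in zip(z, PHI[x])]
        # Diminishing step size (an assumed schedule, not the paper's).
        step = 0.05 / (1.0 + t / 1000.0)
        r = [ri + step * d * zi for ri, zi in zip(r, z)]
        x = x_next
    return r


if __name__ == "__main__":
    print(td_lambda(P, g, PHI))
```

The step sizes here shrink roughly like 1/t, so they are not summable while their squares are. That is the usual kind of condition for stochastic-approximation schemes like this one, but the specific constants are arbitrary.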
Similar resources
Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller
One of the most important issues in controlling delayed systems and non-minimum phase systems is fulfilling objective orientations simultaneously and in the best way possible. This paper proposes a new method that presents an objective orientation for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and has a...
Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
Crop Land Change Monitoring Based on Deep Learning Algorithm Using Multi-temporal Hyperspectral Images
Change detection analyzes two or more images of a region acquired at different times. It is one of the most important applications of satellite imagery, with uses in urban development, environmental inspection, agricultural monitoring, hazard assessment, and natural disasters. The purpose of using deep learning algorithms, in particular, convolutional ...
The Effect of Alpha-Lipoic Acid on Learning and Memory Deficit in a Rat Model of Temporal Lobe Epilepsy
Introduction: Epilepsy is a chronic neurological disorder in which patients experience spontaneous recurrent seizures and deficiency in learning and memory. Although the most commonly recommended therapy is drug treatment, some patients do not achieve adequate control of their seizures on existing drugs. New medications with novel mechanisms of action are needed to help those patients whose sei...
Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms
This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretic perspective. It supplements the latter analysis with simulation results and illustrates the behavior of some of the ETD algorithms using three example problems.
On a convergent off-policy temporal difference learning algorithm in on-line learning environment
In this paper we provide a rigorous convergence analysis of an “off”-policy temporal difference learning algorithm with linear function approximation and per time-step linear computational complexity in an “online” learning environment. The algorithm considered here is TDC with importance weighting introduced by Maei et al. We support our theoretical results by providing suitable empirical results ...